About the Role

Elile builds mission-critical AI systems for national security and energy operations. Our engineering challenges sit at the intersection of AI, distributed systems, and data pipelines. We are looking for a hands-on Technical Lead who can lead design and engineering execution by example, working closely with product and domain experts to ship systems that must be reliable, secure, and performant at scale.

What You Will Lead

Architecture & System Design

  • Design scalable and fault-tolerant architectures for AI-driven industrial platforms, including LLM pipelines, inference systems, and monitoring systems
  • Lead end-to-end system architecture for on-premise deployments, edge compute environments, and GPU-backed clusters
  • Define robust data models, API contracts, and integration patterns for high-volume industrial data sources
  • Ensure system reliability, performance optimization, and high availability across distributed environments
  • Collaborate with cross-functional teams to align architecture decisions with business and operational requirements

Technical Leadership

  • Mentor and guide cross-functional engineering teams, including AI Engineers, Full Stack Engineers, and DevOps Engineers
  • Translate business and product requirements into clear technical roadmaps with defined milestones and deliverables
  • Review codebases, enforce engineering best practices, and ensure architectural consistency across teams
  • Drive build-vs-buy decisions by evaluating open-source and commercial solutions
  • Ensure platform security, compliance, and adherence to industry standards

AI & Intelligence Systems

  • Collaborate with AI engineers to train and deploy open LLMs, Retrieval-Augmented Generation (RAG) systems, domain-specific models, and agentic workflows
  • Architect scalable data pipelines for OSINT, SIGINT, and OPSINT ingestion, real-time turbine telemetry, and industrial sensor networks
  • Design intelligent systems for processing high-volume, real-time, and batch data streams
  • Optimize inference pipelines, caching strategies, and dataflows for low-latency industrial applications
  • Ensure AI systems are production-ready, scalable, and reliable in edge and on-prem environments

Platform Reliability & Observability

  • Ensure platforms meet strict uptime and reliability requirements across production environments
  • Design and maintain monitoring, alerting, and observability stacks for distributed systems
  • Lead incident response processes and guide root-cause analysis for system failures
  • Support debugging and performance tuning of complex distributed architectures
  • Implement role-based access control (RBAC), encryption, and end-to-end data governance

Cross-Functional Collaboration

  • Work closely with Product, AI Research, and Energy/Intelligence domain experts to align technical solutions with business goals
  • Break down complex system requirements into clear, actionable engineering tasks
  • Coordinate with internal and external stakeholders, including customers, partners, and system integrators
  • Own and drive technical communication across teams and leadership
  • Support pre-sales, solution design discussions, and technical reviews when required

What You Bring

Technical Foundations

  • 4+ years of experience in software engineering, with at least 2 years in a technical leadership or system architecture role
  • Strong command of Python, Go, or Node.js, with experience designing and scaling large backend systems
  • Deep understanding of distributed systems and microservices architectures
  • Experience designing and operating high-availability, fault-tolerant systems
  • Hands-on experience with object storage platforms such as Amazon S3, MinIO, or Azure Blob Storage
  • Strong knowledge of containerization and orchestration technologies, including Docker and Kubernetes

AI & Data Experience (Preferred)

  • Hands-on experience working with Large Language Models (LLMs), vector databases, Retrieval-Augmented Generation (RAG), or ML pipelines
  • Familiarity with frameworks and tools such as LangGraph and LangSmith
  • Strong understanding of chunking strategies, embeddings, model tuning, and prompt engineering
  • Experience designing and implementing agentic workflows for AI-driven systems
  • Experience deploying and optimizing models on GPU-backed infrastructure is a strong plus

Domain Knowledge (Bonus, Not Required)

  • Familiarity with energy systems, turbines, and industrial platforms, including SCADA, OMS, and EMS integrations
  • Understanding of cybersecurity principles, including zero-trust architectures and secure system design
  • Exposure to OSINT and SIGINT data pipelines, threat intelligence platforms, or intelligence systems
  • Experience with real-time monitoring, automation, and control platforms

What Success Looks Like

Across your first 30, 90, and 180 days, you will have:

  • Designed and deployed large-scale, AI-native features across Elile products
  • Improved engineering velocity through clearer system architecture and streamlined development processes
  • Led engineering teams through complex technical challenges to deliver robust and secure systems
  • Directly contributed to Elile’s mission of building autonomous, intelligent infrastructure for the region